First released in 1991 with the name MaizeDB, the Maize Genetics and Genomics Database, now MaizeGDB, celebrates its 
20th anniversary this year. MaizeGDB has transitioned from a focus on comprehensive curation of the literature, genetic 
maps and stocks to a paradigm that accommodates the recent release of a reference maize genome sequence, multiple 
diverse maize genomes and sequence-based gene expression data sets. The MaizeGDB Team is relatively small, and relies 
heavily on the research community to provide data, nomenclature standards and most importantly, to recommend future 
directions, priorities and strategies. Key aspects of MaizeGDB's intimate interaction with the community are the co-location 
of curators with maize research groups in multiple locations across the USA as well as coordination with MaizeGDB's close 
partner, the Maize Genetics Cooperation — Stock Center. In this report, we describe how the MaizeGDB Team currently 
interacts with the maize research community and our plan for future interactions that will support updates to the
functional and structural annotation of the B73 reference genome.



A brief history of MaizeDB 

In maize, the release of the B73 reference genome sequence
(1), coupled with advances in sequencing technologies,
continues to generate massive quantities of data.
MaizeGDB, the maize research community's Model
Organism Database (MOD), is charged with integrating much
of these data and with providing an access point that serves
data representations [e.g. the MaizeGDB Genome
Browser (2), descriptions of loci and gene models, etc.] to
external data resources with connections to physical entities
such as genetic stocks. In addition, the Maize Genome
Sequencing Consortium has requested that MaizeGDB collect,
document and disseminate researcher-contributed information
to aid in continued genome assembly and
annotation endeavors.

In MaizeGDB's initial incarnation (as MaizeDB; 1990-2000),
comprehensive literature annotation was one of the central
foci of the database resource, with the main goal being to
capture experimentally confirmed gene functions and trait
inheritance, along with extensive community data documenting
genetic maps and molecular markers (3; aka mapping probes).
Currently, the curation focus is to facilitate
data integration of very large data sets and to provide insight
into development of easy-to-use interfaces and data
displays. The efforts toward data integration involve gene
nomenclature considerations as well as ontology development
and implementation. Literature annotation is
an ongoing process, but at a greatly reduced rate, per
guidance provided by the 2004 MaizeDB to MaizeGDB
Transition Steering Committee,
http://www.maizegdb.org/steering_committee.php. Priority is given to manuscripts
suggested by the MaizeGDB Editorial Board (approximately
five journal articles per month;
http://www.maizegdb.org/cgi-bin/editorial_board.cgi) and, as time permits, to newly
sequenced genes with experimentally confirmed functions.
Although a comprehensive set of tools that permits data
curation by the maize community has been available for
many years (4), it has been very rarely utilized, a problem
consistently reported by other genome database projects
(5). In an effort to increase literature curation, implementation
of Textpresso (6) is currently underway. Textpresso
will serve both as a tool to facilitate curational activities and
as a mechanism for researchers to search publications
based on biological categories such as Gene Ontology terms
(7), http://www.geneontology.org/. Additionally, we have
partnered with a research team at Truman State
University. This group, composed of faculty in biology and
computer science, engages undergraduates to perform
Gene Ontology annotation of maize gene models,
http://sam.truman.edu/ (8). In cases where there are experimental
data in the published literature, these annotations will be
reviewed, incorporated into MaizeGDB and provided to
UniProt/Swiss-Prot, which links to MaizeGDB. We envision
this becoming a model for similar community-based annotation
efforts.

MaizeGDB and the community 

The permanent staff for MaizeGDB is relatively small (five 
persons), and we rely heavily on the maize genetics com- 
munity and other stakeholders, notably the Maize Genetics 
Executive Committee, http://www.maizegdb.org/mgec.php, 
the National Corn Growers Association (9) and the 
MaizeGDB Working Group (described below) for guidance 
and assistance. The maize community has a long history of 
cooperation that dates to the early part of the 20th century 
(10) and enthusiastically participates in community surveys 
to help in guiding MaizeGDB to address stakeholder needs 
(2). The maize community has been a major contributor to 
the NSF's Plant Genome Research Program's accomplish- 
ments from that program's very beginnings, including 
laying substantial groundwork early on toward sequencing 
the maize genome (11). While community planning discus- 
sions are held in open forums at the Annual Maize Genetics 
Conference (now in its 53rd year), smaller, more
focused meetings are also held to identify community
needs and set priorities. The most recent occurred in 2007
at Allerton Park, Illinois,
http://www.maizegdb.org/AllertonReport.doc. Allerton has historic interest, as it was
also the site of the first maize meeting (Figure 1). At
MaizeGDB, we provide extensive outreach and other services
to support continued development of the maize community,
which are described in previous reports (4, 12-14)
and in the accompanying Database (Oxford) article
(Harper et al.).

One way that we are able to enhance our interactions
with the maize community is by the physical distribution of
the MaizeGDB curation staff. The software development
and database management team members are centralized
at Ames, Iowa (USA), in a building where several other
database groups are also located: SoyBase (15),
http://www.soybase.org; PLEXdb (16), http://www.plexdb.org;
and PlantGDB (17), http://www.plantgdb.org. Ames, IA is
also the home of maize researchers working in various
areas of research (18-25) and the North Central Regional
Plant Introduction Station, where maize breeding stocks are
maintained as a part of the National Plant Germplasm
System,
http://www.ars.usda.gov/Main/site_main.htm?modecode=36-25-12-00.
MaizeGDB curators (J.M.G., L.C.H. and M.L.S.), who also
serve as outreach staff, are
located in areas where many maize geneticists are stationed:
at the USDA-ARS Plant Gene Expression Center
near the University of California, Berkeley (26-29), at the
University of Arizona (30-33) and at the
USDA-ARS/University of Missouri, Columbia (34-41). A closely related
and collaborative project that also serves the maize genetics
research community is the
Maize Genetics Cooperation-Stock Center (42), http://maizecoop.cropsci.uiuc.edu,
located at Urbana, Illinois. The Stock Center uses
MaizeGDB as the primary interface to data about mutants,
stocks and phenotypes, provides physical materials to researchers,
and has close contact with other maize research
groups in that area (43,44). Curators interact closely among
themselves by frequent email and phone calls, conference
calls, in-person meetings and postings to a wiki for the
MaizeGDB team.

In addition to close contact with the maize researchers,
members of the MaizeGDB team are, or have been, part
of the elected Maize Genetics Executive Committee
(C.J.L. and M.L.S.); the Maize Genetics Nomenclature
Committee (M.L.S.); the Maize Meeting Steering
Committee (C.M.A. and M.L.S.); the Editorial Board of the
Maize Genetics Cooperation Newsletter (M.L.S.); and the
Corn Germplasm Committee (CGC),
http://www.ars-grin.gov/npgs/cgclist.html (C.J.L. and M.L.S.). MaizeGDB hosts
community websites, http://www.maizegdb.org/cooperators.php,
for all of these groups, save the CGC, so that
activities can be made accessible for public consumption.
This involvement enables members of the MaizeGDB Team to respond
appropriately and quickly to community needs, to
function as a clearinghouse for maize data and to
enforce proper nomenclature.

Formal interaction with the community occurs via our 
working group (WG). Established in 2006, the MaizeGDB 
Working Group is tasked with evaluating MaizeGDB's cur- 
rent status and recommending a course of action that will 
ensure that the MaizeGDB project follows the trajectory of 
maize research as closely as possible, providing a robust and 
timely source of data and analysis tools. The WG is com- 
posed of 10-12 members of the maize community, who are 
active in diverse areas of research and generally serve a 
term of 3 years (see 'Acknowledgments' section for current 
members of the WG). The WG meets at least once a year,
preferably in person at a national meeting, or when
circumstances/schedules do not permit, via online conferencing.
After a brief presentation by the MaizeGDB staff describing
recent accomplishments, the WG provides guidance on
issues of special concern to MaizeGDB. Since its creation,
the WG has provided critical guidance on topics including
but not limited to: content development for the supporting
2007-2012 USDA-ARS Project Plan, selection of appropriate
mechanisms to visualize and interact with the maize
genome sequence, outlining a specific role for MaizeGDB
in hosting maize genome sequence assemblies as well as
structural and functional annotations, user interface updates,
sequence-based expression data representations
and opportunities for interactions with other projects.

Figure 1. The maize genetics cooperation, 1928-2009. The top timeline depicts the establishment of a Stock Center and
Newsletter in 1928, followed by funding for MaizeDB in 1990, and then the 2009 release of the B73 reference genome sequence.
The leftmost inset shows the participants of the first formal Maize Genetics Meeting, at Allerton, IL in 1959 [photograph courtesy
Earl Patterson and the Newsletter (57)]. The color photograph [courtesy anonymous photographer, and the Newsletter (58)]
depicts the subset of the current maize research community that convened in 2007, at Allerton, IL to plan infrastructure needs.
This 2007 meeting included representatives of MaizeGDB (CJL, MLS, Trent Seigfried).

Genome annotation in the past 

Over the years, about 6000 functional genes described in
the literature have been curated into MaizeGDB. However,
this count is dwarfed by the 32 540 gene models predicted
for the B73 genome by the Maize Genome Sequencing
Consortium (1), the 25 703 RefSeq cDNA-unigenes in
GenBank (45), and the approximately 10 000 gene models
currently thought not to exist in B73, but to be present in
other inbred lines (35,46). These new loci (predicted gene
models) can be integrated with the existing functionally
defined genes in many different ways. One way to integrate
these data is to build a tool that relies on molecular
markers, often sequenced, that are shared between the
genetic and physical maps and documented at MaizeGDB.
Based on requests from our users, we have created the
'Locus Lookup Tool',
http://www.maizegdb.org/cgi-bin/locus_lookup.cgi?id=, to implement this idea. Locus
Lookup helps researchers with genetically mapped genes
to identify the chromosomal window that contains their
gene of interest (47). This tool can aid positional cloning
efforts and ultimately connects a theoretical gene model
with a biologically defined gene. It is currently one of our
most popular tools to access maize data. In other cases,
cDNA sequences aligned to the genome assembly are also
assigned to locus variations by manual curation, which can
be used to link classical genetic information with the
genome sequence. For example, the well-studied b1 gene
has a manually curated cDNA sequence accession, X57276,
which is associated with the mutant allele, B1-Peru, and has been
aligned by the PlantGDB pipeline to the assembled
B73 genome:
http://www.maizegdb.org/cgi-bin/displayseqrecord.cgi?id=X57276 (17). Our manual curation of
sequence accessions is periodically shared with the NCBI and
can be found along with many gene and marker names and
synonyms on the NCBI gene records; for example, see the
b1 gene record: http://www.ncbi.nlm.nih.gov/gene/542724.
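
The idea behind a lookup of this kind can be illustrated with a minimal Python sketch. It is not the Locus Lookup Tool's actual implementation, which is not described here: the Marker record, the lookup_window function and the marker data are hypothetical, and the sketch assumes only that each marker carries both a genetic position (cM) and a physical position (bp) on the same chromosome.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    """Hypothetical marker anchored on both the genetic and physical maps."""
    name: str
    chromosome: str
    cm: float  # genetic map position (centimorgans)
    bp: int    # physical position on the assembly (base pairs)


def lookup_window(markers, chromosome, gene_cm):
    """Return (left_marker, right_marker, start_bp, end_bp) for a mapped gene.

    The window is bounded by the nearest anchored marker below gene_cm and
    the nearest one at or above it. Returns None when the gene is not
    bracketed by two markers on that chromosome.
    """
    on_chr = sorted((m for m in markers if m.chromosome == chromosome),
                    key=lambda m: m.cm)
    positions = [m.cm for m in on_chr]
    i = bisect_left(positions, gene_cm)
    if i == 0 or i == len(on_chr):
        return None  # no flanking marker on one side
    left, right = on_chr[i - 1], on_chr[i]
    # Genetic and physical order can disagree locally, so sort the bp pair.
    start, end = sorted((left.bp, right.bp))
    return left.name, right.name, start, end


# Invented example data, for illustration only.
MARKERS = [
    Marker("m1", "2", 10.0, 1_000_000),
    Marker("m2", "2", 25.0, 4_500_000),
    Marker("m3", "2", 40.0, 9_000_000),
    Marker("m4", "1", 30.0, 7_000_000),
]

print(lookup_window(MARKERS, "2", 30.0))
```

A production tool would also need to handle markers with conflicting placements and map versions, which this sketch ignores.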

It is useful to understand the infrastructure that has been
put in place or is under development at MaizeGDB to support
annotation. The MaizeGDB Genome Browser is the
centerpiece for MaizeGDB's transition to a sequence-centric
paradigm (48). Within the Genome Browser, tracks
(Table 1) are displayed with gene model information supplied
by both the Maize Genome Sequencing Consortium
(maizesequence.org) and PlantGDB/ZmGDB,
http://www.plantgdb.org/ZmGDB (17). In addition, we accept
data sets from community members that are mapped to
the genomic sequence and serve them as genome browser
tracks. When requested, we allow community-added information
to be submitted but not displayed at MaizeGDB
before publication. Glyphs displayed within tracks on the
MaizeGDB Genome Browser link to additional information
stored at MaizeGDB and/or offsite.



In some cases when the data need to be recomputed on
an updated genome sequence assembly, we ask for action
on the part of the original submitter. One such submitter,
PlantGDB, serves community annotations via the Distributed
Annotation Service (DAS), wherein the changes to gene
models are stored at PlantGDB, and DAS web services are
used to serve the data for display within the context of the
MaizeGDB Genome Browser. We have created mechanisms
to recompute the alignment of genomic features that
we visualize in the tracks when new versions of the assemblies
(i.e. pseudomolecules) become available. Feature
annotations at MaizeGDB are often provided in files associated
with peer-reviewed publications, usually as
Supplementary Data, where the annotation process is
described. In many of these cases, the underlying supporting
data used to create each browser track have been integrated
systematically at MaizeGDB [e.g. UniformMu (49),
ISU IBM 2009 (50), IBM2 2008 Neighbors (33)] and can be
used for updating annotation of tracks in the future.
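
For readers unfamiliar with DAS, the following minimal Python sketch shows how a client might build a features request for a genomic segment. It follows the general URL form used by the DAS protocol (a features command with a segment argument written as reference:start,stop). The server address and data source name are placeholders, not the actual PlantGDB endpoint, and das_features_url is a hypothetical helper.

```python
from urllib.parse import quote, urlencode


def das_features_url(server, source, reference, start, stop):
    """Build a DAS 'features' request URL for one genomic segment.

    server and source are placeholders supplied by the caller; the segment
    is written as reference:start,stop with 1-based inclusive coordinates.
    """
    if start < 1 or stop < start:
        raise ValueError("expected 1 <= start <= stop")
    segment = f"{reference}:{start},{stop}"
    # Keep ':' and ',' unescaped so the segment stays human-readable.
    query = urlencode({"segment": segment}, safe=":,")
    return f"{server.rstrip('/')}/das/{quote(source)}/features?{query}"


# Hypothetical server and source names, for illustration only.
print(das_features_url("http://das.example.org", "maize_annot", "chr1", 1000, 2000))
```

The server responds with an XML document describing the features in that segment, which a genome browser can then render as a track alongside locally stored data.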

More recently, we have begun a partnership with
PLEXdb to update gene expression data from NimbleGen
arrays used to develop a maize gene atlas (51). MaizeGDB
will redefine oligo probe sets for updates to gene models,
and PLEXdb will re-normalize the raw expression data for
the updated probe sets. Updated associations of gene
models to standard ontologies for plant anatomy and development
will be supplied to the Plant Ontology site,
www.plantontology.org. We contributed to
the initial development of this ontology and have provided
associations as part of our normal operating protocols for
some years (52).



Table 1. Community provided annotation tracks for the MaizeGDB Genome Browser

Tracks (a)                                            Source (b)                                       External feature link (c)
Ac/Ds                                                 Brutnell & Vollbrecht, via PlantGDB (23)         PlantGDB.org
UniformMu                                             McCarty (49)
ISU IBM 2009                                          Schnable (50)
IBM2 2008 Neighbors                                   Arizona Genomics Institute (33)
Centromere                                            J.M. Jiang & G. Presting, unpublished data (55)
antiCENH3-ChIP                                        J.M. Jiang & G. Presting, unpublished data (55)
MIPS repeats, Gene models                             Maize Genome Sequencing Consortium (1)           maizesequence.org
PLEXdb                                                PLEXdb (16)                                      PLEXdb.org
Structural annotation-community                       PlantGDB (yrGATE, 54)                            PlantGDB.org
cDNA, EST, GSS, unique transcripts, GSS, gene models  PlantGDB (17)                                    PlantGDB.org
MAGI                                                  P. Schnable (56)                                 magi.plantgenomics.iastate.edu
Mo17 SNP/Indel                                        D.M. Rokshar (unpublished data)                  http://www.jgi.doe.gov/
Leaf transcriptome                                    Brutnell (57)                                    cbsuss03.tc.cornell.edu/cgi-bin/gbrowse/c3c4_pm

a Content of browser tracks supplied by community.
b Person or group providing data with literature reference for data source in parentheses.
c The external data source for those cases where clicking on a track feature leads directly to an external data source.

B73 RefGen_v3 is in preparation, with delivery currently
estimated for July 2011
(Doreen H. Ware, personal communication). This version is
anticipated to be the final assembly product provided
by the Maize Genome Sequencing Consortium and will
be made available via MaizeGDB. Thereafter, the default
view of the MaizeGDB Genome Browser is scheduled to be
updated annually to an incremented release of the genome
assembly by the MaizeGDB Team. FTP and BLAST (53) access
to builds will be made available before new assemblies are
released, allowing research groups to align their
sequence-indexed data well in advance of full deployment
via the MaizeGDB Genome Browser. Timely access is
important not only for our community annotators (Table 1) but also
for many other researchers. For example, in microarray and
RNA-seq expression data, expression values are normalized
over values for oligos corresponding to the current gene
models. MaizeGDB currently accepts data aligned to the
B73 Reference Genome Assembly (currently B73
RefGen_v2) and is committed to working with members
of the community to integrate well-documented structural
and functional genome annotation and to release new
assemblies annually.

Genome annotation in the future 

From community surveys by the Maize Genetics Executive
Committee and community discussions at the 2010 Maize
Genetics Conference, the maize community has identified
improvement and annotation of the maize B73 sequence
assembly as its two top priorities. At the request of the
Maize Genetics Executive Committee, we are in the process
of forming a Maize Genome Annotation Consortium. The
MaizeGDB Team appreciates the importance of helping
people work together toward a common goal and has recently
posted guidelines on its website,
http://www.maizegdb.org/assembly.php, for groups who plan to
engage in sequence assembly and annotation.

These guidelines stipulate several elements for a fully
successful collaboration with the maize community and
with MaizeGDB, the ultimate disseminator of the information.
A key element is transparency, whereby projects
make public their plans, standards and data delivery timelines
at the initial stages. As data are delivered, detailed
documentation for assembly and annotation should be provided.
Annotations confirmed by experimental evidence
should be clearly discriminated from those based purely
on in silico analyses. Standard evidence codes, e.g. those
stipulated for GO annotations, should be employed. To facilitate
data integration at MaizeGDB, and also planning by
all in the research community, data delivery dates should
be made known in advance, although it is understood that
there may be some changes from the initial timeline.
MaizeGDB will create a mechanism to display progress,
with the main aim of facilitating communication with the
maize research community.
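
As an illustration of how evidence codes support the distinction drawn above, the minimal Python sketch below separates annotations backed by experimental evidence from those derived in silico. The code sets follow the standard GO evidence code categories (experimental, computational analysis and electronic annotation). The grouping into three classes, the function names and the example annotations are simplifying assumptions made for this sketch, not a MaizeGDB specification.

```python
# GO evidence codes grouped by kind of support (simplified for this sketch).
EXPERIMENTAL = frozenset({"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"})
IN_SILICO = frozenset({
    "ISS", "ISO", "ISA", "ISM", "IGC", "IBA", "IBD", "IKR", "IRD", "RCA",
    "IEA",  # inferred from electronic annotation, no curator review
})


def support_class(evidence_code):
    """Classify one evidence code as 'experimental', 'in silico' or 'other'."""
    code = evidence_code.strip().upper()
    if code in EXPERIMENTAL:
        return "experimental"
    if code in IN_SILICO:
        return "in silico"
    return "other"  # e.g. author statements (TAS, NAS) or curator codes (IC, ND)


def split_by_support(annotations):
    """Group (gene, go_term, evidence_code) tuples by class of support."""
    groups = {"experimental": [], "in silico": [], "other": []}
    for gene, go_term, code in annotations:
        groups[support_class(code)].append((gene, go_term, code))
    return groups


# Hypothetical gene identifiers and term assignments, for illustration only.
example = [
    ("gene_A", "GO:0003700", "IDA"),
    ("gene_B", "GO:0006355", "IEA"),
    ("gene_C", "GO:0003700", "TAS"),
]
for label, items in split_by_support(example).items():
    print(label, [gene for gene, _, _ in items])
```

Keeping the evidence code with every annotation record lets a database display or filter the two classes separately without re-curating the underlying data.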

In addition, there should be a mechanism to interact 
with the maize community directly and with a single 
voice for the project. Maize researchers comprise a vibrant
community with researchers at all levels in both the public
and private sectors, and as such, an annotation project
means different things to different people. A bidirectional
means of communicating with the maize community 
should be deployed at the start of the project so that the 
maize community can both absorb and respond to new 
annotation information quickly. The goal is to provide all 
community members with the same information at the 
same time so that they can plan their research activities 
accordingly. This can be accomplished in many ways 
(quarterly e-newsletters, FAQs, blogs, social media, confer- 
ences, etc.) and all options should be considered so as to 
reach the largest number of stakeholders (i.e. the maize 
researchers). 

Another key element is to capture genome assembly 
information from the community. Currently, individual 
researchers are generating excellent, lab-validated genetic 
markers and order/orientation information of sequence 
fragments within BACs. Researchers are usually willing
to share this information freely, but currently, there is 
no robust means to capture it. There should be an easy 
way through a web interface for researchers to submit 
data. All annotation submitted by community members 
should be vetted manually by expert annotators, and sub- 
sequently incorporated into the assembly, with an indica- 
tion of who provided the data. It is expected that while 
there will be comparatively little data entering the assem- 
bly process in this way, these data would be of very high 
quality. 

As with genome assembly, researchers currently have 
high-quality structural and functional annotation for their 
genes of interest, both stored on lab computers and docu- 
mented in publications. Researchers should be provided 
with tools to improve structural and functional annotation 
information that can then be integrated into the larger 
project's outcomes. These same tools could also be leveraged
for classroom teaching. The ZmGDB/PlantGDB yrGATE
(54) system and iPlant's DNA Subway,
http://dnasubway.iplantcollaborative.org, are good examples of the sort of
interface that could serve both groups.

We strongly encourage planning a workshop for education,
outreach and training in all aspects of annotation to
increase community understanding of and involvement
in the annotation of the maize genome. One obvious
way to involve the community would be to contact the
Maize Genetics Conference Steering Committee,
http://www.maizegdb.org/maize_meeting/, about getting your
message out at the Maize Meeting.

Summary

The maize community is both excited about and prepared 
to begin work toward annotating the maize genome, both 
in B73 and in other inbred lines (34, 46). We at MaizeGDB 
are looking forward to continuing our partnership with the 
community to provide informatics and organizational sup- 
port for ongoing research activities. 